Abstract. Two natural decision problems regarding the XML query
language XQuery are well-definedness and semantic type-checking. We
study these problems in the setting of a relational fragment of XQuery.
We show that well-definedness and semantic type-checking are
undecidable, even in the positive-existential case. Nevertheless, for a "pure"
variant of XQuery, in which no identification is made between an item
and the singleton containing that item, the problems become decidable.
We also consider the analogous problems in the setting of the nested
relational calculus.




The author is supported by NSF Grant IIS-0082407 

Research Assistant of the Fund for Scientific Research - Flanders (Belgium) 



2 Jan Van den Bussche, Dirk Van Gucht, and Stijn Vansummeren 

1 Introduction 

Much attention has been paid recently to XQuery, the XML query language cur- 
rently under development by the World Wide Web Consortium [5,9]. Unlike in 
traditional query languages, expressions in XQuery can have an undefined mean- 
ing (i.e., these expressions produce a run-time error). As an example, consider 
the following variation on one of the XQuery use cases [7] : 

    <bib> {
      for $b in $bib/book
      where $b/publisher = "Springer-Verlag"
      return element {$b/author} {$b/title}
    } </bib>

This expression should create for each book published by Springer-Verlag a node 
whose name equals the author of the book, and whose child is the title of the 
book. If there is a book with more than one author node, however, then the
result of this expression is undefined because the first argument to the element
constructor must be a singleton list.

This leads us to the natural question whether we can solve the well-definedness 
problem for XQuery: given an expression and an input type, check whether the 
semantics of the expression is defined for all inputs adhering to the input type. 
This problem is undecidable for any computationally complete programming 
language, and hence also for XQuery. Following good programming language 
practice, XQuery therefore is equipped with a static type system (based on XML 
Schema [4,18]) which ensures "type safety" in the sense that every expression 
which passes the type system's tests is guaranteed to be well-defined. Due to the
undecidability of the well-definedness problem, such type systems are necessarily
incomplete, i.e., there are expressions which are well-defined, but not well-typed.

Can we find fragments of XQuery for which the well-definedness problem
is decidable? In this paper we will study Relational XQuery (RX), a set-based
fragment of XQuery where we omit recursive functions, only allow the child axis, 
take a value-based point of view (i.e., we ignore node identity), and use a type 
system similar to that of the nested relational or complex object data model [1, 
6, 19]. We regard RX as the "first-order database fragment" of XQuery. 

Even for RX, the well-definedness problem is still undecidable, due to two 
features which allow us to simulate the relational algebra: quantified expressions 
and type switches. Surprisingly, however, well-definedness remains undecidable 
for RX without these features, which we call positive-existential RX or PERX 
for short. 

The core difficulty here is due to the fact that in the XQuery data model 
an item is identified with the singleton containing that item [11]. In a set-based 
model this identification becomes difficult to analyze, since $\{i, j\}$ is a singleton
if and only if $i = j$. Since, as shown in the example above, there are expressions
which are undefined on non-singleton inputs, this implies that in order to solve 
the well-definedness problem, one also needs to solve the equivalence problem. 
Indeed, we will see that the equivalence problem for PERX is undecidable. 

Nevertheless, for a "pure" variant of PERX, in which no identification is
made between an item and the singleton containing that item, well-definedness
becomes decidable. We actually prove this result not for pure PERX itself, but
for PENRC: the positive-existential fragment of the nested relational calculus
[6,19], which is well-known from the complex object data model, and whose
well-definedness problem is interesting in its own right.

All our results hold not only for well-definedness, but also for semantic
type-checking: given an expression, an input type and an output type, check whether
the expression always returns outputs adhering to the output type on inputs
adhering to the input type.

In the main body of the paper we will work in a set-based data model. 
Considering that the real XML data model is list-based, at the end of the paper 
we will discuss how and if our results transfer to a list-based or bag-based setting. 

Related work The semantic type-checking problem has already been studied ex- 
tensively in XML-related query languages [2, 3, 13-15, 17]. In particular, our set- 
ting closely resembles that of Alon et al. [2, 3] who, like us, study the problem in 
the presence of data values. In particular they have shown that (un)decidability 
depends on the expressiveness of both the query language and the type system. 
While the query language of Alon et al. can simulate PERX, our results do 
not follow immediately from theirs, since their type system is incompatible with 
ours [16]. 

2 Relational XQuery 

In what follows we will need to define various query languages. In some definitions
it will help to talk abstractly about a query language. To this end, we define a
query language $Q$ as a tuple $(V, T, E, [\![\cdot]\!])$ where $V$ is a set of values; $T$ is a set
of types; $E$ is a set of expressions; and $[\![\cdot]\!]$ is the interpretation function giving
a semantics to types and expressions. The set $V$ is also referred to as the data
model.

We assume to be given an infinite set $X = \{x, y, \dots\}$ of variables. Every
expression $e$ has associated with it a finite set $FV(e) \subseteq X$ of free variables.
An environment on $e$ is a function $\sigma : FV(e) \to V$ which associates to each
$x \in FV(e)$ a value $\sigma(x) \in V$. A type assignment on $e$ is a function $\Gamma : FV(e) \to T$
which associates to each $x \in FV(e)$ a type $\Gamma(x) \in T$. If $\rho$ is an environment (or
a type assignment), and $v$ is a value (respectively a type), then we write $x : v, \rho$
for the environment (respectively type assignment) $\rho'$ with domain $dom(\rho) \cup \{x\}$
such that $\rho'(x) = v$ and $\rho'(y) = \rho(y)$ for $y \neq x$. Intuitively, environments describe
the input to expressions, and type assignments describe their type.

Every type $\tau$ is associated with a set $[\![\tau]\!]$ of values. An environment $\sigma$ is
compatible with a type assignment $\Gamma$, denoted by $\sigma \in \Gamma$, if they have the same
domain and $\sigma(x) \in [\![\Gamma(x)]\!]$ for all $x$. Every expression $e$ has associated with it
a (possibly partial) computable function $[\![e]\!]$ which associates environments on
$FV(e)$ to values in $V$. We call $[\![e]\!]$ the semantics of $e$.

In order not to burden our notation we will identify types and expressions
with their respective interpretations, and write for example $e(\sigma)$ for $[\![e]\!](\sigma)$.

2.1 Relational XQuery data model 

In this section we define a set-based fragment of the XQuery data model [11] 
called the Relational XQuery (RX) data model. We take a value-based point of 
view (i.e., we ignore node identity), focus on data values, element nodes and 
data nodes (known as text nodes in XQuery), and abstract away from the other 
features in the XQuery data model such as attributes. 

We assume to be given a recursively enumerable set $A = \{a, b, \dots\}$ of atoms.
An item is an atom or a node. A node is either an element node $\langle a : N \rangle$ or a data
node $\langle a \rangle$, where $a \in A$ and $N$ is a finite set of nodes ($N$ is called the content of
the element node). An RX-value, finally, is any finite set of items.
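
As a rough illustration of this data model, the following Python sketch represents atoms as strings and nodes as immutable records, so that values can be ordinary finite sets. The names `Data`, `Elem`, `is_node`, `is_item` and `is_rx_value` are assumptions introduced here for illustration; they do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Data:
    """Data node <a>: wraps a single atom."""
    atom: str


@dataclass(frozen=True)
class Elem:
    """Element node <a : N>: a name atom and a finite set N of nodes."""
    name: str
    content: frozenset = frozenset()


def is_node(i):
    # A node is a data node or an element node whose content holds only nodes.
    if isinstance(i, Data):
        return isinstance(i.atom, str)
    if isinstance(i, Elem):
        return isinstance(i.name, str) and all(is_node(n) for n in i.content)
    return False


def is_item(i):
    # An item is an atom (here: a string) or a node.
    return isinstance(i, str) or is_node(i)


def is_rx_value(v):
    # An RX-value is any finite set of items.
    return isinstance(v, frozenset) and all(is_item(i) for i in v)


title = Elem("title", frozenset({Data("XQuery")}))
book = Elem("book", frozenset({title}))
```

Because the records are frozen and compared by value, two nodes with the same name and content are the same node, which matches the value-based point of view taken above.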

An RX-type $\tau$ is a term generated by the following grammar:

$$
\begin{array}{rcl}
\tau & ::= & \mathbf{coll}(\iota) \mid \mathbf{single}(\iota) \\
\iota & ::= & \mathbf{atom} \mid \nu \mid \iota \cup \iota \\
\nu & ::= & \mathbf{data} \mid \mathbf{elem}(\gamma) \mid \nu \cup \nu \\
\gamma & ::= & \mathbf{coll}(\nu) \mid \mathbf{single}(\nu)
\end{array}
$$

Here, $\tau$ ranges over types, $\iota$ ranges over item types, $\nu$ ranges over node types,
and $\gamma$ ranges over node content types. An RX-type denotes a set of RX-values:

— $\mathbf{data}$ denotes the set of all data nodes,

— $\mathbf{elem}(\gamma)$ denotes the set of all element nodes $\langle a : N \rangle$ for which $N$ belongs
to the denotation of $\gamma$,

— $\mathbf{atom}$ denotes the set $A$ of all atoms,

— $\iota_1 \cup \iota_2$ denotes the union of the denotations of $\iota_1$ and $\iota_2$,

— $\mathbf{coll}(\iota)$ denotes the set of all finite sets over the denotation of $\iota$, and

— $\mathbf{single}(\iota)$ denotes the set of all singletons over the denotation of $\iota$.

Note that every $\gamma$ is also a $\tau$, and hence the denotation of terms produced by $\gamma$
is subsumed in the definition above.
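
The denotations above translate directly into a membership test. The sketch below is only an illustration: types are written as nested Python tuples such as `("coll", "atom")` or `("elem", ("single", "data"))`, an encoding chosen here and not taken from the paper, and `item_in` and `value_in` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Data:
    atom: str


@dataclass(frozen=True)
class Elem:
    name: str
    content: frozenset


def item_in(i, t):
    """Does item i belong to the denotation of item type t?"""
    if t == "atom":
        return isinstance(i, str)
    if t == "data":
        return isinstance(i, Data)
    if t[0] == "elem":
        # The content of the node must itself have the node content type.
        return isinstance(i, Elem) and value_in(i.content, t[1])
    if t[0] == "union":
        return item_in(i, t[1]) or item_in(i, t[2])
    raise ValueError(f"unknown item type: {t!r}")


def value_in(v, t):
    """Does the finite set v belong to coll(iota) or single(iota)?"""
    kind, iota = t
    if kind not in ("coll", "single"):
        raise ValueError(f"unknown type constructor: {kind!r}")
    if kind == "single" and len(v) != 1:
        return False
    return all(item_in(i, iota) for i in v)


book = Elem("book", frozenset({Data("x")}))
book_or_atom = ("union", "atom", ("elem", ("coll", "data")))
```

The only place where cardinality matters is the `single` constructor, which is exactly the feature that the undecidability results later exploit.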

An RX-kind $\kappa$ is a term generated by the following grammar:

$$\kappa ::= \mathbf{atom} \mid \mathbf{data} \mid \mathbf{elem} \mid \kappa \cup \kappa$$

An RX-kind denotes a set of items, which can be the set of all atoms, the set of
all data nodes, the set of all element nodes, or the union of the denotations of
two kinds.

Discussion The type system we have defined above is quite simple. Types merely 
indicate the many-or-one cardinality of a value, and the kinds of items that can 
appear in it. Only values of a fixed maximal nesting height can be described in 
our type system. This is justified because the expressions in the XQuery fragment
RX we will work with in this paper can look only a fixed number of nesting levels
down anyway. Also, it is an open secret that most XML documents in practice
have nesting heights at most five or six, and that unbounded-depth nesting is
not needed for many XML data processing tasks.

The presence of the single type constructor is justified by the fact that an 
item i is identified with the singleton set {i} in the XQuery data model [11]. 
Consequently, an XQuery expression in which the input is always expected to 
be a string actually receives singleton strings as inputs. Its input type would 
therefore be single(atom) in our setting. 

Our types also do not specify anything about the names of element nodes,
but this is an omission for the sake of simplicity; we could have added node types
of the form $\mathbf{elem}_a(\gamma)$, with $a$ the atom that must be the name of the element
node, without sacrificing any of the results we present in this paper.



2.2 Relational XQuery syntax and semantics 

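A Relational XQuery expression is an expression generated by the following
grammar:

$$
\begin{array}{rcl}
e & ::= & x \\
  & \mid & \mathrm{text}\{e\} \mid \mathrm{elem}\{e\}\{e\} \mid \mathrm{data}(e) \mid \mathrm{name}(e) \mid \mathrm{children}(e) \\
  & \mid & () \mid e, e \mid \mathrm{for}\ x : \kappa\ \mathrm{in}\ e\ \mathrm{return}\ e \\
  & \mid & \mathrm{if}\ e\ \mathrm{eq}\ e\ \mathrm{then}\ e\ \mathrm{else}\ e \mid \mathrm{if}\ e = ()\ \mathrm{then}\ e\ \mathrm{else}\ e \mid \mathrm{if}\ e \in \tau\ \mathrm{then}\ e\ \mathrm{else}\ e
\end{array}
$$

Here, $e$ ranges over RX-expressions, $x$ ranges over variables, $\tau$ ranges over
RX-types and $\kappa$ ranges over RX-kinds. The free variables of $e$ are defined in the
usual way, and will be denoted by $FV(e)$.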

The semantics of RX is parameterized by two "oracle" functions: 

— content, which maps element nodes to atoms; and 

— concat, which maps finite sets of atoms to atoms. 

We further define the following (partial) functions on values:

— $\mathit{data}(v) = \{a \mid a \in v\} \cup \{a \mid \langle a \rangle \in v\} \cup \{\mathit{content}(\langle a : N \rangle) \mid \langle a : N \rangle \in v\}$,

— $\mathit{name}(v)$, which is $\{a\}$ if $v$ is a singleton element node $\{\langle a : N \rangle\}$; $\mathit{concat}(v)$
if $v$ is empty; and undefined otherwise.

— $\mathit{children}(v)$, which is undefined if there is some atom in $v$, and otherwise
returns

$$\bigcup \{N \mid \langle a : N \rangle \in v\}.$$

— $\mathit{construct}(v, w)$ which is undefined if $\mathit{data}(v)$ is not a singleton atom $\{a\}$; and
returns $\langle a : N \rangle$ otherwise, where $N$ is obtained from $w$ by replacing every
atom in $w$ by a corresponding data node:

$$N = \{\langle a \rangle \mid a \in w\} \cup \{i \mid i \in w,\ i \text{ is a node}\}.$$
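
The four partial functions can be sketched in Python as follows, with undefinedness modelled by an exception. This is an illustration under stated assumptions: `Undefined`, `defined`, and the bodies of the two oracles `content` and `concat` are placeholders invented here (the paper leaves the oracles uninterpreted), and the empty-set case of `name` wraps the atom in a singleton because an item is identified with its singleton.

```python
from dataclasses import dataclass


class Undefined(Exception):
    """Models an undefined result, i.e. an XQuery run-time error."""


@dataclass(frozen=True)
class Data:
    atom: str


@dataclass(frozen=True)
class Elem:
    name: str
    content: frozenset


def content(node):
    # Placeholder oracle (assumption): some fixed map from element nodes to atoms.
    return "content-of-" + node.name


def concat(atoms):
    # Placeholder oracle (assumption): some fixed map from finite sets of atoms to atoms.
    return "".join(sorted(atoms))


def data(v):
    out = set()
    for i in v:
        if isinstance(i, str):
            out.add(i)                # atoms are kept
        elif isinstance(i, Data):
            out.add(i.atom)           # data nodes are unwrapped
        else:
            out.add(content(i))       # element nodes go through the oracle
    return frozenset(out)


def name(v):
    if not v:
        return frozenset({concat(v)})
    if len(v) == 1:
        (i,) = v
        if isinstance(i, Elem):
            return frozenset({i.name})
    raise Undefined("name: not a singleton element node")


def children(v):
    if any(isinstance(i, str) for i in v):
        raise Undefined("children: the value contains an atom")
    return frozenset(n for i in v if isinstance(i, Elem) for n in i.content)


def construct(v, w):
    d = data(v)
    if len(d) != 1:
        raise Undefined("construct: data(v) is not a singleton atom")
    (a,) = d
    # Atoms in w become data nodes; nodes are kept unchanged.
    return Elem(a, frozenset(Data(i) if isinstance(i, str) else i for i in w))


def defined(thunk):
    """True if evaluating thunk() does not raise Undefined."""
    try:
        thunk()
        return True
    except Undefined:
        return False
```

For instance, `construct(frozenset({"Abiteboul", "Buneman"}), frozenset({"title"}))` raises `Undefined`, mirroring the book with two authors from the introduction.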
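
Let $e$ be an RX-expression and let $\sigma$ be an RX-environment on $e$.$^3$ The
semantics $e(\sigma)$ of $e$ under $\sigma$ can now be inductively defined as follows:

$$
\begin{aligned}
x(\sigma) &= \sigma(x) \\
\mathrm{text}\{e\}(\sigma) &= \{\langle \mathit{concat}(\mathit{data}(e(\sigma))) \rangle\} \\
\mathrm{elem}\{e_1\}\{e_2\}(\sigma) &= \{\mathit{construct}(e_1(\sigma), e_2(\sigma))\} \\
\mathrm{data}(e)(\sigma) &= \mathit{data}(e(\sigma)) \\
\mathrm{name}(e)(\sigma) &= \mathit{name}(e(\sigma)) \\
\mathrm{children}(e)(\sigma) &= \mathit{children}(e(\sigma)) \\
()(\sigma) &= \emptyset \\
(e_1, e_2)(\sigma) &= e_1(\sigma) \cup e_2(\sigma) \\
(\mathrm{for}\ x : \kappa\ \mathrm{in}\ e_1\ \mathrm{return}\ e_2)(\sigma) &= \bigcup \{e_2(x : \{i\}, \sigma) \mid i \in e_1(\sigma) \cap \kappa\} \\
(\mathrm{if}\ e_1\ \mathrm{eq}\ e_2\ \mathrm{then}\ e_3\ \mathrm{else}\ e_4)(\sigma) &=
\begin{cases}
e_3(\sigma) & \text{if } \mathit{data}(e_1(\sigma)) = \mathit{data}(e_2(\sigma)) = \{a\}, \text{ with } a \text{ an atom} \\
e_4(\sigma) & \text{if } \mathit{data}(e_1(\sigma)) = \{a\},\ \mathit{data}(e_2(\sigma)) = \{b\}, \text{ with } a, b \text{ atoms},\ a \neq b
\end{cases} \\
(\mathrm{if}\ e_1 = ()\ \mathrm{then}\ e_2\ \mathrm{else}\ e_3)(\sigma) &=
\begin{cases}
e_2(\sigma) & \text{if } e_1(\sigma) = \emptyset \\
e_3(\sigma) & \text{otherwise}
\end{cases} \\
(\mathrm{if}\ e_1 \in \tau\ \mathrm{then}\ e_2\ \mathrm{else}\ e_3)(\sigma) &=
\begin{cases}
e_2(\sigma) & \text{if } e_1(\sigma) \in \tau \\
e_3(\sigma) & \text{otherwise}
\end{cases}
\end{aligned}
$$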

Note that $e(\sigma)$ is not necessarily defined: this models the situations in which
XQuery expression evaluation produces a run-time error. Specifically, $e(\sigma)$ can
become undefined for the following reasons:

— $e = \mathrm{elem}\{e_1\}\{e_2\}$, and $\mathit{data}(e_1(\sigma))$ is not a singleton atom. (This can only
happen if $e_1(\sigma)$ is not a singleton.)

— $e = \mathrm{name}(e')$, and $e'(\sigma)$ is not a singleton element node.

— $e = \mathrm{children}(e')$, and $e'(\sigma)$ contains an atom.

— $e = \mathrm{if}\ e_1\ \mathrm{eq}\ e_2\ \mathrm{then}\ e_3\ \mathrm{else}\ e_4$, and $\mathit{data}(e_1(\sigma))$ is not a singleton atom, or
$\mathit{data}(e_2(\sigma))$ is not a singleton atom. (This can only happen if $e_1(\sigma)$
respectively $e_2(\sigma)$ is not a singleton.)

Relation to XQuery The RX query language corresponds to a set-based version 
of XQuery [5,9] where we have omitted recursive functions, literals, arithmetic 
expressions, generalized and order comparisons, and only allow the children axis. 
We have replaced XQuery quantified expressions by the emptiness test (which 
is equivalent in expressive power), and have moved kind tests from XQuery step
expressions to the "for" expression. As an example, the XQuery step expression
$x/child::text() can be expressed in RX as

    for z : data in children(x) return z.

The "oracle" functions concat and content model features which are present
in XQuery, but which are clumsy to take into account in our data model. For
example name applied to the empty set returns the empty string in XQuery.
Furthermore, applying data to a singleton element node in XQuery returns the
"string content" of the node. This is (roughly speaking) a concatenation of all
atoms (converted to strings) encountered in a depth-first left-to-right traversal
of the node's content.

3 Recall from the beginning of this section that $\sigma$ assigns an RX-value to each free
variable of $e$.



3 Well-definedness and semantic type-checking 

As we have noted in Section 2.2, the semantics $e(\sigma)$ of an RX-expression $e$ under
environment $\sigma$ can be undefined. This leads us to the following definition.

Definition 1. The well-definedness problem for a query language $Q$ consists of
checking, given a $Q$-expression $e$ and a $Q$-type assignment $\Gamma$ on $e$, whether $e(\sigma)$
is defined for every $\sigma \in \Gamma$. In this case we say that $e$ is well-defined under $\Gamma$.

A problem which is related to well-definedness is the semantic type-checking
problem:

Definition 2. The semantic type-checking problem for a query language $Q$
consists of checking, given a $Q$-expression $e$, a $Q$-type assignment $\Gamma$ on $e$ such that
$e$ is well-defined under $\Gamma$, and a $Q$-type $\tau$, whether $e(\sigma) \in \tau$ for every $\sigma \in \Gamma$. In
this case we say that $\tau$ is an output type for $e$ under $\Gamma$.

4 Undecidability results 

We will show that well-definedness for RX is undecidable, even for a quite
restricted fragment. Our results do not depend on the particular interpretation
given to the oracle functions concat and content.

Let us begin by defining $\mathrm{RX}^-$ as the fragment of RX where

— we disallow data node construction expressions of the form $\mathrm{text}\{e\}$;

— we disallow data extraction expressions of the form $\mathrm{data}(e)$; and

— we disallow kind tests, or equivalently, we only allow the use of the single
"universal" kind $\mathbf{atom} \cup \mathbf{data} \cup \mathbf{elem}$.

An $\mathrm{RX}^-$-expression $e$ is positive existential if it does not contain emptiness
tests of the form $\mathrm{if}\ e_1 = ()\ \mathrm{then}\ e_2\ \mathrm{else}\ e_3$, or type switches of the form
$\mathrm{if}\ e_1 \in \tau\ \mathrm{then}\ e_2\ \mathrm{else}\ e_3$. We denote the language of all positive-existential
$\mathrm{RX}^-$ expressions by $\mathrm{PERX}^-$, and we will mention specific features added back to
$\mathrm{PERX}^-$ in square brackets. Thus, $\mathrm{PERX}^-$[empty] includes emptiness tests, and
$\mathrm{PERX}^-$[type] includes type switches.




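Proposition 1. $\mathrm{PERX}^-$[type] is equivalent to $\mathrm{RX}^-$; in other words, type switches
can be used to simulate emptiness tests.

Indeed, $\mathrm{if}\ e_1 = ()\ \mathrm{then}\ e_2\ \mathrm{else}\ e_3$ can be expressed as follows:

$$\mathrm{if}\ (\mathrm{for}\ x\ \mathrm{in}\ e_1\ \mathrm{return}\ \mathrm{elem}\{a\}\{()\}) \in \mathbf{coll}(\mathbf{data})\ \mathrm{then}\ e_2\ \mathrm{else}\ e_3$$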
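
The idea behind this rewriting can be checked on concrete values. In the sketch below (an illustration only; `in_coll_data` and `is_empty_via_type_switch` are names made up here), the `for` loop produces one element node per item of the tested value, and the result lies in coll(data) exactly when no such node was produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Data:
    atom: str


@dataclass(frozen=True)
class Elem:
    name: str
    content: frozenset


def in_coll_data(v):
    """Membership in coll(data): every item of v is a data node."""
    return all(isinstance(i, Data) for i in v)


def is_empty_via_type_switch(e1_value):
    # Value of "for x in e1 return elem{a}{()}": one node <a : {}> per item
    # of e1, which set semantics collapses into at most one node.
    marked = frozenset(Elem("a", frozenset()) for _ in e1_value)
    # The empty set is in coll(data); a set holding an element node is not.
    return in_coll_data(marked)
```

On every finite set `v`, `is_empty_via_type_switch(v)` agrees with `len(v) == 0`, which is the content of the proposition for this particular encoding.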

The following proposition is not surprising, and parallels earlier results on
semistructured query languages such as StruQL [10]:

Proposition 2. $\mathrm{PERX}^-$[empty] can simulate the relational algebra. Concretely,
for every relational algebra expression $\phi$ over database schema $S$, there exists a
$\mathrm{PERX}^-$[empty]-expression $e_\phi$ and a type assignment $\Gamma_S$, such that

— $e_\phi$ is well-defined under $\Gamma_S$, and,

— $e_\phi$ evaluated on an encoding of database $D$ equals an encoding of $\phi(D)$.

The simulation is described in Appendix A.

Consequently, satisfiability (i.e., nonempty output on at least one input) is
undecidable for $\mathrm{PERX}^-$[empty] (and thus for $\mathrm{RX}^-$), because it is undecidable
for the relational algebra. Since the expression

$$\mathrm{for}\ x\ \mathrm{in}\ e\ \mathrm{return}\ \mathrm{elem}\{()\}\{()\}$$

is well-defined if, and only if, $e$ is unsatisfiable, we obtain:

Corollary 1. Well-definedness for $\mathrm{PERX}^-$[empty] (and thus RX) is undecidable.

What is perhaps more surprising is that even without emptiness tests,
well-definedness remains undecidable:

Theorem 1. Well-definedness for $\mathrm{PERX}^-$ is undecidable.

Proof (Crux). The proof goes by reduction from the implication problem for
functional and inclusion dependencies, which is known to be undecidable [1,8].
Let $\Sigma$ be a set of functional and inclusion dependencies, and let $\rho$ be an
inclusion dependency. We show in Appendix B that we can construct two expressions
$e_1$ and $e_2$, a type assignment $\Gamma$ and a node content type $\gamma$, such that

— $e_1$ and $e_2$ are well-defined under $\Gamma$,

— $\gamma$ is an output type for $e_1$ and $e_2$ under $\Gamma$, and,

— $e_1(\sigma) = e_2(\sigma)$ for every $\sigma \in \Gamma$ if, and only if, $\rho$ is implied by $\Sigma$.

Consequently, the expression $\mathrm{name}(\mathrm{elem}\{a\}\{e_1\}, \mathrm{elem}\{a\}\{e_2\})$ is well-defined
under $\Gamma$ if, and only if, $\rho$ is implied by $\Sigma$. □
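
The last step of the reduction rests on a small observation: the set containing the two constructed nodes is a singleton exactly when the two contents coincide. The following sketch illustrates that step only, not the construction of $e_1$ and $e_2$ from Appendix B; the function names are assumptions made here, and the inputs stand for values of type $\gamma$, i.e. sets of nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Elem:
    name: str
    content: frozenset


def name_is_defined(v):
    # name(v) is defined on a singleton element node. The empty-set case of
    # name cannot arise below, because the argument always holds a node.
    return len(v) == 1 and all(isinstance(i, Elem) for i in v)


def reduction_is_defined(e1_value, e2_value):
    # Value of "elem{a}{e1}, elem{a}{e2}": the union of two singletons.
    # Since e1_value and e2_value contain only nodes, construct leaves
    # them unchanged as node contents.
    both = frozenset({Elem("a", e1_value)}) | frozenset({Elem("a", e2_value)})
    return name_is_defined(both)


n1 = frozenset({Elem("t", frozenset())})
n2 = frozenset({Elem("t", frozenset()), Elem("u", frozenset())})
```

Here `reduction_is_defined(n1, n2)` is false while `reduction_is_defined(n1, n1)` is true, so deciding well-definedness of the combined expression would decide equality of the two outputs.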

As a corollary to the proof, we note:

Corollary 2. Equivalence of $\mathrm{PERX}^-$ expressions is undecidable.

We further derive:

Corollary 3. Semantic type-checking for $\mathrm{PERX}^-$ is undecidable.

Indeed, referring to the above proof sketch of Theorem 1, $e_1$ and $e_2$ are equivalent
if, and only if, $(\mathrm{elem}\{a\}\{e_1\}, \mathrm{elem}\{a\}\{e_2\})$ has output type $\mathbf{single}(\mathbf{elem}(\gamma))$.

We remark that to establish undecidability of well-definedness we do not need
singleton types. For undecidability of semantic type-checking, we do.

5 Pure RX 

In the XQuery data model, an item $i$ is identified with the singleton $\{i\}$ [11].
With this identification, it is indeed natural to let, e.g., $\mathrm{name}(e)$ be undefined
when $e(\sigma)$ is a set with more than one element. As we have seen in the previous
section, it is exactly this behavior that causes well-definedness to be undecidable.

So let us define a version of RX, called pure RX, which does not explicitly
identify an item $i$ with $\{i\}$. We will show in Section 6 that well-definedness
and semantic type-checking for the positive-existential fragment of pure RX are
decidable.

A pure RX-value is an item or a set of items. A pure RX-type $\tau$ is a term
generated by the following grammar:

$$
\begin{array}{rcl}
\tau & ::= & \mathbf{coll}(\iota) \mid \iota \mid \tau \cup \tau \\
\iota & ::= & \mathbf{atom} \mid \nu \mid \iota \cup \iota \\
\nu & ::= & \mathbf{data} \mid \mathbf{elem}(\nu_1 \cup \dots \cup \nu_k)
\end{array}
$$

Here, $\tau$ ranges over types, $\iota$ ranges over item types, $\nu$ ranges over node types,
and $k > 0$.

A pure RX-type denotes a set of pure RX-values:

— $\mathbf{data}$ denotes the set of all data nodes,

— $\mathbf{elem}(\nu_1 \cup \dots \cup \nu_k)$ denotes the set of all element nodes $\langle a : N \rangle$ for which $N$
is a finite set over the union of the denotations of $\nu_1, \dots, \nu_k$,

— $\mathbf{atom}$ denotes the set $A$ of all atoms,

— $\tau_1 \cup \tau_2$ denotes the union of the denotations of $\tau_1$ and $\tau_2$, and,

— $\mathbf{coll}(\iota)$ denotes the set of all finite sets over the denotation of $\iota$.

Note that since every $\iota$ is also a $\tau$, the denotation of $\iota_1 \cup \iota_2$ is subsumed by the
definition above.

The syntax of pure RX is obtained from the syntax of RX by adding a
singleton constructor expression $(e)$, and by replacing RX-types in type switch
expressions by pure RX-types.

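In order to give the semantics of pure RX, we define the following (partial)
functions on pure RX-values.

— $\mathit{data}'(v) = \{a \mid a \in v\} \cup \{a \mid \langle a \rangle \in v\}$

— $\mathit{name}'(v)$, which is $a$ if $v$ is an element node $\langle a : N \rangle$, and is undefined
otherwise.

— $\mathit{children}'(v)$, which is undefined if there is some atom in $v$, and otherwise
returns

$$\bigcup \{N \mid \langle a : N \rangle \in v\}.$$

— $\mathit{construct}'(v, w)$ which is undefined if $v$ is not an atom, and returns $\langle v : N \rangle$
otherwise, where $N$ is obtained from $w$ by replacing every atom in $w$ by a
corresponding data node:

$$N = \{\langle a \rangle \mid a \in w\} \cup \{i \mid i \in w,\ i \text{ is a node}\}$$

The semantics of pure RX is then defined as follows:

$$
\begin{aligned}
x(\sigma) &= \sigma(x) \\
\mathrm{text}\{e\}(\sigma) &= \langle a \rangle \quad \text{if } e(\sigma) = a \\
\mathrm{elem}\{e_1\}\{e_2\}(\sigma) &= \mathit{construct}'(e_1(\sigma), e_2(\sigma)) \\
\mathrm{data}(e)(\sigma) &= \mathit{data}'(e(\sigma)) \\
\mathrm{name}(e)(\sigma) &= \mathit{name}'(e(\sigma)) \\
\mathrm{children}(e)(\sigma) &= \mathit{children}'(e(\sigma)) \\
()(\sigma) &= \emptyset \\
(e)(\sigma) &= \{e(\sigma)\} \quad \text{if } e(\sigma) \text{ is an item} \\
(e_1, e_2)(\sigma) &= e_1(\sigma) \cup e_2(\sigma) \\
(\mathrm{for}\ x : \kappa\ \mathrm{in}\ e_1\ \mathrm{return}\ e_2)(\sigma) &= \bigcup \{e_2(x : i, \sigma) \mid i \in e_1(\sigma) \cap \kappa\} \\
(\mathrm{if}\ e_1\ \mathrm{eq}\ e_2\ \mathrm{then}\ e_3\ \mathrm{else}\ e_4)(\sigma) &=
\begin{cases}
e_3(\sigma) & \text{if } e_1(\sigma), e_2(\sigma) \in A \text{ and } e_1(\sigma) = e_2(\sigma) \\
e_4(\sigma) & \text{if } e_1(\sigma), e_2(\sigma) \in A \text{ and } e_1(\sigma) \neq e_2(\sigma)
\end{cases} \\
(\mathrm{if}\ e_1 = ()\ \mathrm{then}\ e_2\ \mathrm{else}\ e_3)(\sigma) &=
\begin{cases}
e_2(\sigma) & \text{if } e_1(\sigma) = \emptyset \\
e_3(\sigma) & \text{otherwise}
\end{cases} \\
(\mathrm{if}\ e_1 \in \tau\ \mathrm{then}\ e_2\ \mathrm{else}\ e_3)(\sigma) &=
\begin{cases}
e_2(\sigma) & \text{if } e_1(\sigma) \in \tau \\
e_3(\sigma) & \text{otherwise}
\end{cases}
\end{aligned}
$$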

Note that again $e(\sigma)$ is not necessarily defined. Specifically, $e(\sigma)$ can become
undefined for the following reasons:

— $e = \mathrm{text}\{e'\}$, and $e'(\sigma)$ is not an atom,

— $e = \mathrm{elem}\{e_1\}\{e_2\}$, and $e_1(\sigma)$ is not an atom,

— $e = \mathrm{name}(e')$, and $e'(\sigma)$ is not an element node,

— $e = \mathrm{children}(e')$, and $e'(\sigma)$ contains an atom,

— $e = (e')$, and $e'(\sigma)$ is not an item,

— $e = e_1, e_2$, and $e_1(\sigma)$ is not a set or $e_2(\sigma)$ is not a set, or,

— $e = \mathrm{if}\ e_1\ \mathrm{eq}\ e_2\ \mathrm{then}\ e_3\ \mathrm{else}\ e_4$, and $e_1(\sigma)$ or $e_2(\sigma)$ is not an atom.
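
To make the contrast with RX concrete, here is a small Python sketch of three pure operations. It is only an illustration: the names `name_p`, `construct_p`, `singleton`, `Undefined` and `defined` are invented here, and `construct_p` assumes that its second argument is a set of items. The point is that definedness now depends on the shape of a value (item versus set), never on whether a set happens to have exactly one element.

```python
from dataclasses import dataclass


class Undefined(Exception):
    """Models an undefined result."""


@dataclass(frozen=True)
class Data:
    atom: str


@dataclass(frozen=True)
class Elem:
    name: str
    content: frozenset


def is_item(v):
    return isinstance(v, (str, Data, Elem))


def name_p(v):
    """name'(v): defined only on an element node, never on a set."""
    if isinstance(v, Elem):
        return v.name
    raise Undefined("name': not an element node")


def construct_p(v, w):
    """construct'(v, w): v must be an atom; w is assumed to be a set of items."""
    if not isinstance(v, str):
        raise Undefined("construct': first argument is not an atom")
    return Elem(v, frozenset(Data(i) if isinstance(i, str) else i for i in w))


def singleton(v):
    """Semantics of the singleton constructor (e): defined only on items."""
    if is_item(v):
        return frozenset({v})
    raise Undefined("singleton constructor: not an item")


def defined(thunk):
    try:
        thunk()
        return True
    except Undefined:
        return False
```

In particular, `name_p` fails on the singleton set containing an element node, whereas the RX function name succeeds on it; in pure RX the caller must pass the node itself.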

Pure PERX Clearly, well-definedness and semantic type-checking for the entire pure RX
remain undecidable due to the presence of the emptiness test and type switch
expressions. Let us define pure PERX as the fragment of pure RX in which
these expressions are disallowed.

6 Decidability results 

In this section we will show that well-definedness and semantic type-checking for
pure PERX are decidable. In fact, we will solve the corresponding problems for
the nested relational calculus (NRC), the well-known standard query language
for nested relations and complex objects. Indeed, this language remains
fundamental and its study remains interesting in its own right. As we will see, pure
PERX can be simulated by the positive-existential fragment of NRC (extended
with kind tests).

6.1 Nested relational calculus 

An NRC-value is either an atom, a pair of NRC-values, or a finite set of
NRC-values. Note that we allow sets to be heterogeneous. If $v = (v_1, v_2)$, then we
write $\pi_1(v)$ for $v_1$ and $\pi_2(v)$ for $v_2$.

An NRC-type $\tau$ is a term generated by the following grammar:

$$\tau ::= \emptyset \mid \mathbf{atom} \mid \tau \times \tau \mid \tau \cup \tau \mid \mathbf{coll}(\tau)$$

An NRC-type denotes a set of NRC-values:

— $\emptyset$ denotes the empty set,

— $\mathbf{atom}$ denotes the set $A$ of all atoms,

— $\tau_1 \times \tau_2$ denotes the cartesian product of the denotations of $\tau_1$ and $\tau_2$,

— $\tau_1 \cup \tau_2$ denotes the union of the denotations of $\tau_1$ and $\tau_2$, and,

— $\mathbf{coll}(\tau)$ denotes the set of all finite sets over the denotation of $\tau$.

An NRC-kind $\kappa$ is a term generated by the following grammar:

$$\kappa ::= \mathbf{atom} \mid \mathbf{coll} \mid \kappa \times \kappa \mid \kappa \cup \kappa$$

An NRC-kind denotes a set of NRC-values, which can be the set of all atoms,
the set of all finite sets of values, the cartesian product of the denotations of two
kinds, or the union of the denotations of two kinds.

The positive existential nested relational calculus (PENRC) is the set of all
expressions generated by the following grammar:

$$
\begin{array}{rcl}
e & ::= & x \\
  & \mid & (e, e) \mid \pi_1(e) \mid \pi_2(e) \\
  & \mid & \emptyset \mid \{e\} \mid e \cup e \mid \bigcup e \mid \{e \mid x \in e\} \\
  & \mid & e = e \mathbin{?} e : e
\end{array}
$$

Here $e$ ranges over expressions, and $x$ ranges over variables. The PENRC with
kind tests, denoted by PENRC[kind], is the PENRC extended with one additional
expression:

$$e ::= \cdots \mid e \in \kappa \mathbin{?} e : e$$

Here, $\kappa$ ranges over NRC-kinds. The free variables of $e$ are defined in the usual
way, and will be denoted by $FV(e)$.

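If $e$ is a PENRC[kind]-expression and $\sigma$ is an NRC-environment on $e$, then
the semantics $e(\sigma)$ of $e$ under $\sigma$ is inductively defined as follows:

$$
\begin{aligned}
x(\sigma) &= \sigma(x) \\
(e_1, e_2)(\sigma) &= (e_1(\sigma), e_2(\sigma)) \\
\pi_1(e)(\sigma) &= \pi_1(e(\sigma)) \\
\pi_2(e)(\sigma) &= \pi_2(e(\sigma)) \\
\emptyset(\sigma) &= \emptyset \\
\{e\}(\sigma) &= \{e(\sigma)\} \\
(e_1 \cup e_2)(\sigma) &= e_1(\sigma) \cup e_2(\sigma) \\
(\textstyle\bigcup e)(\sigma) &= \textstyle\bigcup e(\sigma) \\
\{e_2 \mid x \in e_1\}(\sigma) &= \{e_2(x : v, \sigma) \mid v \in e_1(\sigma)\} \\
(e_1 = e_2 \mathbin{?} e_3 : e_4)(\sigma) &=
\begin{cases}
e_3(\sigma) & \text{if } e_1(\sigma), e_2(\sigma) \in A \text{ and } e_1(\sigma) = e_2(\sigma) \\
e_4(\sigma) & \text{if } e_1(\sigma), e_2(\sigma) \in A \text{ and } e_1(\sigma) \neq e_2(\sigma)
\end{cases} \\
(e_1 \in \kappa \mathbin{?} e_2 : e_3)(\sigma) &=
\begin{cases}
e_2(\sigma) & \text{if } e_1(\sigma) \in \kappa \\
e_3(\sigma) & \text{otherwise}
\end{cases}
\end{aligned}
$$

Note that $e(\sigma)$ can be undefined. For example $\pi_1(x)(\sigma)$ is undefined when $\sigma(x)$
is not a pair, and $(x \cup y)(\sigma)$ is undefined when $\sigma(x)$ is not a set. Hence, we can
also study the well-definedness problem for PENRC[kind].

It is easy to see that well-definedness for full NRC (PENRC extended with an
emptiness test) is undecidable. Indeed, it is well known that NRC can simulate
the relational algebra [6].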

6.2 Simulating RX in NRC 

Formally, a simulation of a query language $Q$ in a query language $Q'$ is a function
$enc : V_Q \to V_{Q'}$ such that

— for every type $\tau \in T_Q$ there exists a type $\tau' \in T_{Q'}$ such that $v \in \tau$ if and only
if $enc(v) \in \tau'$, and

— for every expression $e \in E_Q$ there exists an expression $e' \in E_{Q'}$ such that

1. $e(\sigma)$ is defined if and only if $e'(enc(\sigma))$ is defined, and

2. if $e(\sigma)$ is defined, then $enc(e(\sigma)) = e'(enc(\sigma))$.

A simulation is effective if $\tau'$ can be computed from $\tau$ and $e'$ can be computed
from $e$.

Lemma 1. Pure PERX can be effectively simulated in PENRC[kind].

Proof (Crux). Consider the encoding function $enc$ for which

$$
\begin{aligned}
enc(a) &= a & enc(\langle a \rangle) &= ((a, a), \emptyset) \\
enc(\langle a : N \rangle) &= (a, enc(N)) & enc(v) &= \{enc(i) \mid i \in v\}
\end{aligned}
$$

Then $enc$ is an effective simulation. It is easy to find $\tau'$ by induction on $\tau$.
Furthermore, as shown in Appendix C, $e'$ can be constructed by induction on $e$.
To illustrate this, we first introduce some syntactic sugar: we write $e_1 \in \kappa \to e_2$
for $e_1 \in \kappa \mathbin{?} e_2 : \pi_1(\emptyset)$. Intuitively, this expression will be used to verify that the
input to $e'$ is an encoding of a legal input to $e$. Otherwise, we become undefined.
We can now for example simulate $\langle e \rangle$ by $e' \in \mathbf{atom} \to ((e', e'), \emptyset)$. We can
simulate $\langle e_1 : e_2 \rangle$ by

$$e_1' \in \mathbf{atom} \to (e_1', \{x \in \mathbf{atom} \mathbin{?} ((x, x), \emptyset) : x \mid x \in e_2'\}).$$

And we can simulate $\mathrm{children}(e)$ by $\bigcup \{\pi_2(x) \mid x \in e'\}$. □
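
The encoding and the simulation of children can be tried out on small values. The sketch below is an illustration under assumptions made here: atoms are Python strings, NRC pairs are Python 2-tuples, sets are frozensets, and `children_sim` is a hypothetical name for the NRC expression that takes the union of all second projections.

```python
from dataclasses import dataclass


class Undefined(Exception):
    """Models an undefined result."""


@dataclass(frozen=True)
class Data:
    atom: str


@dataclass(frozen=True)
class Elem:
    name: str
    content: frozenset


def enc(v):
    if isinstance(v, str):            # enc(a) = a
        return v
    if isinstance(v, Data):           # enc(<a>) = ((a, a), {})
        return ((v.atom, v.atom), frozenset())
    if isinstance(v, Elem):           # enc(<a : N>) = (a, enc(N))
        return (v.name, enc(v.content))
    if isinstance(v, frozenset):      # enc(v) = {enc(i) | i in v}
        return frozenset(enc(i) for i in v)
    raise TypeError(f"not a pure RX-value: {v!r}")


def children_sim(encoded):
    """Union of pi_2(x) for every x in the encoded set."""
    out = frozenset()
    for x in encoded:
        if not (isinstance(x, tuple) and len(x) == 2):
            # pi_2 applied to an encoded atom: undefined, like children on atoms.
            raise Undefined("projection of a non-pair")
        out |= x[1]
    return out


book = Elem("book", frozenset({Data("t"), Elem("title", frozenset())}))
value = frozenset({book, Data("u")})
```

Encoded data nodes contribute the empty set to the union, so `children_sim(enc(value))` equals the encoding of the content of `book`, as the simulation requires.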

Corollary 4. If the well-definedness or semantic type-checking problem is
decidable for PENRC[kind], then it is also decidable for pure PERX.

6.3 Well-definedness for pure PENRC[kind]

Consider the following expression:

$$e = \{\{z = y \mathbin{?} \pi_1(z) : y \mid y \in x\} \mid x \in R\}$$

and let the environment $\sigma$ be defined by

$$\sigma(R) = \{\{a, b\}, \{c\}, \{a, b, d\}\}, \qquad \sigma(z) = d.$$

Since there is a set in $\sigma(R)$ which contains $\sigma(z)$, we will need to evaluate $\pi_1(\sigma(z))$
at some point, which is undefined. Hence, $e(\sigma)$ is undefined. Note that we do
not need all elements in $\sigma(R)$ to reach the state where $e(\sigma)$ becomes undefined.
Indeed, $e$ is also undefined on the small environment $\sigma'$ where $\sigma'(R) = \{\{d\}\}$
and $\sigma'(z) = d$.
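
The example can be replayed with a small interpreter. The following Python sketch is an illustration, not the paper's construction: expressions are nested tuples in an encoding invented here, kind tests are left out, and a comprehension over a non-set is treated as undefined (an assumption consistent with the semantics above).

```python
class Undefined(Exception):
    """Models an undefined result."""


def ev(e, env):
    """Evaluate a PENRC expression (kind tests omitted) under environment env."""
    op = e[0]
    if op == "var":
        return env[e[1]]
    if op == "pair":
        return (ev(e[1], env), ev(e[2], env))
    if op in ("pi1", "pi2"):
        v = ev(e[1], env)
        if not isinstance(v, tuple):
            raise Undefined("projection of a non-pair")
        return v[0] if op == "pi1" else v[1]
    if op == "empty":
        return frozenset()
    if op == "sing":
        return frozenset({ev(e[1], env)})
    if op == "union":
        a, b = ev(e[1], env), ev(e[2], env)
        if not (isinstance(a, frozenset) and isinstance(b, frozenset)):
            raise Undefined("union of a non-set")
        return a | b
    if op == "flatten":
        v = ev(e[1], env)
        if not (isinstance(v, frozenset) and all(isinstance(s, frozenset) for s in v)):
            raise Undefined("flatten of a non-set of sets")
        return frozenset().union(*v)
    if op == "comp":                       # ("comp", body, x, source) = {body | x in source}
        _, body, x, source = e
        s = ev(source, env)
        if not isinstance(s, frozenset):
            raise Undefined("comprehension over a non-set")
        return frozenset(ev(body, {**env, x: v}) for v in s)
    if op == "eq":                         # ("eq", e1, e2, e3, e4) = e1 = e2 ? e3 : e4
        a, b = ev(e[1], env), ev(e[2], env)
        if not (isinstance(a, str) and isinstance(b, str)):
            raise Undefined("equality test on a non-atom")
        return ev(e[3] if a == b else e[4], env)
    raise ValueError(f"unknown expression: {e!r}")


def defined(thunk):
    try:
        thunk()
        return True
    except Undefined:
        return False


inner = ("comp", ("eq", ("var", "z"), ("var", "y"), ("pi1", ("var", "z")), ("var", "y")),
         "y", ("var", "x"))
e = ("comp", inner, "x", ("var", "R"))

sigma = {"R": frozenset({frozenset({"a", "b"}), frozenset({"c"}),
                         frozenset({"a", "b", "d"})}), "z": "d"}
sigma_small = {"R": frozenset({frozenset({"d"})}), "z": "d"}
sigma_ok = {"R": frozenset({frozenset({"a", "b"})}), "z": "d"}
```

Both `ev(e, sigma)` and `ev(e, sigma_small)` raise `Undefined`, while `ev(e, sigma_ok)` returns the set containing the set of a and b, since no inner set contains the value of z.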

We generalize this observation in the following general property, proven in
Appendix D. Here, we say that an environment $\sigma$ is in the set $\mathcal{E}_k$ if every set
occurring in $\sigma(x)$ has cardinality at most $k$ for every $x \in dom(\sigma)$.

Lemma 2 (Small model property for undefinedness). Let $e$ be a PENRC[kind]
expression, let $\Gamma$ be a type assignment on $e$, and let $\sigma$ be an environment
compatible with $\Gamma$ such that $e(\sigma)$ is undefined. There exists a natural number $l$ which
can be computed from $e$ alone, and an environment $\sigma' \in \mathcal{E}_l$ compatible with $\Gamma$,
such that $e(\sigma')$ is also undefined.

We obtain:

Corollary 5. The well-definedness problem for PENRC[kind] is decidable.

Indeed, up to isomorphism (and expressions cannot distinguish isomorphic
inputs) there are only a finite number of different input environments in $\mathcal{E}_l$
compatible with $\Gamma$. So we can test them all to see if there is a counterexample to
well-definedness.
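
The brute-force procedure behind this corollary can be sketched as follows. The code is only an illustration under several assumptions made here: NRC-types are written as nested tuples, the bound `k` and the finite pool of atoms are supplied by the caller (how the bound is computed from the expression is the subject of Appendix D and is not reproduced), and `is_defined` stands for any procedure that evaluates the expression on one environment and reports whether the result is defined.

```python
from itertools import combinations, product


def values(t, atoms, k):
    """All NRC-values of type t over the atom pool with every set of cardinality <= k."""
    if t == "empty":
        return []
    if t == "atom":
        return list(atoms)
    if t[0] == "pair":
        return list(product(values(t[1], atoms, k), values(t[2], atoms, k)))
    if t[0] == "union":
        # dict.fromkeys removes duplicates while keeping a stable order.
        return list(dict.fromkeys(values(t[1], atoms, k) + values(t[2], atoms, k)))
    if t[0] == "coll":
        members = values(t[1], atoms, k)
        return [frozenset(c) for n in range(k + 1) for c in combinations(members, n)]
    raise ValueError(f"unknown type: {t!r}")


def environments(gamma, atoms, k):
    """All environments compatible with the type assignment gamma, within the bounds."""
    names = sorted(gamma)
    for combo in product(*(values(gamma[x], atoms, k) for x in names)):
        yield dict(zip(names, combo))


def well_defined(is_defined, gamma, atoms, k):
    """Search the bounded environments for a counterexample to well-definedness."""
    return all(is_defined(env) for env in environments(gamma, atoms, k))


gamma = {"R": ("coll", ("coll", "atom")), "z": "atom"}
```

As a usage example, the expression $e$ from the start of this section is undefined exactly when some set in R contains the value of z, so passing `lambda env: all(env["z"] not in s for s in env["R"])` as `is_defined` already finds a counterexample with `k = 1` and two atoms.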

Also for semantic type-checking we have:

Lemma 3 (Small model property for semantic type-checking). Let $e$ be
a PENRC[kind] expression, let $\Gamma$ be a type assignment on $e$ such that $e$ is
well-defined under $\Gamma$, and let $\tau$ be a type. Let $\sigma$ be an environment compatible with
$\Gamma$ such that $e(\sigma) \notin \tau$. There exists a natural number $l$ which can be computed
from $e$ and $\tau$ alone, and an environment $\sigma' \in \mathcal{E}_l$ compatible with $\Gamma$, such that
also $e(\sigma') \notin \tau$.

Corollary 6. Semantic type-checking for pure PENRC[kind] is decidable.

6.4 Equivalence and satisfiability 

The above decidability results are quite sharp, because equivalence of PENRC
expressions is undecidable. This can be proven in a similar way as Theorem 1.
Of course, containment is then also undecidable. Levy and Suciu [12] have
shown that a "deep" form of containment (known as simulation) is decidable
for PENRC.

Another important problem is satisfiability. For example, the XQuery type
system generates a type error whenever it can deduce that an expression which
is not the empty set expression itself always returns the empty set. As noted
in Section 4, satisfiability is undecidable for $\mathrm{PERX}^-$[empty]. For pure PERX
and PENRC[kind], however, satisfiability can be solved using the small model
property for semantic type-checking. Indeed, a PENRC[kind] expression $e$ is
satisfiable under $\Gamma$ if, and only if, $\mathbf{coll}(\emptyset)$ is not an output type for $e$ under $\Gamma$,
since $\mathbf{coll}(\emptyset)$ contains only the empty set. We
point out that, at least for PENRC without union and kind tests, decidability
of satisfiability already follows from the work of Levy and Suciu cited above.

7 Lists and bags 

In this paper we have focused our attention on a set-based abstraction of XQuery. 
The actual data model of XQuery is list-based however, and hence it is natural 
to ask how our results transfer to such a setting. 

Let us denote by $\mathrm{RX}^{list}$ the list-based version of RX, which can be obtained
from RX as follows. The list-based RX data model is obtained by replacing
"set" in the definition of the RX data model by "list". The list-based semantics
of an expression is obtained from the set-based semantics by replacing every set
operator by the corresponding list operator (i.e., empty set becomes empty list,
union becomes concatenation, and so on). We can similarly define the bag-based
version of RX, which we will denote by $\mathrm{RX}^{bag}$.

We can still simulate the relational algebra in the list- and bag-based
versions of $\mathrm{PERX}^-$[empty] and $\mathrm{PERX}^-$[type]. Hence, well-definedness and semantic
type-checking for these languages are undecidable. It is an open problem however
whether well-definedness and semantic type-checking in the list- and bag-based
versions of $\mathrm{PERX}^-$ remain undecidable. Indeed, our undecidability proof
depends heavily on the fact that set union is idempotent, which is not the case for
list concatenation and bag union.

We can also consider a list-based and bag-based version of PENRC[kind],
to which our decidability results transfer. Hence, well-definedness and semantic
type-checking are decidable for pure $\mathrm{PERX}^{list}$ and pure $\mathrm{PERX}^{bag}$.
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A Relational Algebra Simulation 

Proposition 3. The relational algebra can be simulated in PERX⁻[empty]
under any interpretation of the "oracle" functions concat and content. Concretely,
for every relational algebra expression φ over database schema S, there exists a
PERX⁻[empty]-expression e and a type assignment Γ, such that

— e is well-defined under Γ, and,

— e evaluated on an encoding of database D equals an encoding of φ(D).

Proof. In order to focus on the crux of the proof we will assume below that
atoms are expressions in PERX⁻[empty]. At the end of the proof we illustrate
how we can get rid of this assumption.

There exists a well-known encoding of tuples as unordered trees. For example,
the tuple (A_1 : a_1, ..., A_n : a_n) can be encoded as the element node

⟨T : {⟨A_1 : {a_1}⟩, ..., ⟨A_n : {a_n}⟩}⟩.

Here we assume without loss of generality that attribute names are atoms. A
relation can then be encoded as the set of the encodings of its tuples.
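
As an illustration only, the encoding of tuples and relations can be sketched in Python. The representation is an assumption made for this sketch and is not part of the paper: an element node ⟨name : children⟩ is modelled as a pair (name, frozenset of children), attribute values are plain strings, and the function names `encode_tuple` and `encode_relation` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical representation (not from the paper): an element node
# <name : children> is the pair (name, frozenset_of_children).
Node = Tuple[str, FrozenSet]


def encode_tuple(t: Dict[str, str]) -> Node:
    """Encode (A_1: a_1, ..., A_n: a_n) as <T : {<A_1 : {a_1}>, ..., <A_n : {a_n}>}>."""
    return ("T", frozenset((attr, frozenset([val])) for attr, val in t.items()))


def encode_relation(rel: Iterable[Dict[str, str]]) -> FrozenSet[Node]:
    """A relation is encoded as the set of the encodings of its tuples."""
    return frozenset(encode_tuple(t) for t in rel)
```

Because the children form a set, the order of attributes is irrelevant, and duplicate tuples collapse to a single encoding, as expected in the set-based model.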

We assume without loss of generality that relation names are variables. Let
S be a database schema, and let φ be a relational algebra expression over S. A
database D over S can be encoded as an environment σ such that σ(r) is an
encoding of D(r) for every relation name r.

We say that an expression e with FV(e) = FV(φ) simulates φ if e(σ) is an
encoding of φ(D) whenever σ is an encoding of D. We construct an expression
e_φ which simulates φ by induction on φ as follows:⁴

— If φ = r then e_φ = r.

— If φ = σ_{A_1 = A_2}(ψ) then we define e_φ to be

    for t in e_ψ return
      for x_1, x_2 in children(t) return
        if name(x_1) = A_1 and name(x_2) = A_2
           and children(x_1) = children(x_2) then
          t
        else
          ()

— If φ = π_{A_1, ..., A_n}(ψ) then we define e_φ to be

    for t in e_ψ return
      elem{T}{
        for x in children(t) return
          if name(x) = A_1 or ... or name(x) = A_n then
            x
          else
            ()
      }

4 Here, we allow binding multiple variables in one for loop, and also allow boolean
combinations of conditions in an if test. Both features can clearly be simulated in
PERX⁻[empty].

— If φ = ψ_1 × ψ_2 then we define e_φ to be (the schemas of ψ_1 and ψ_2 are
disjoint):

    for t_1 in e_{ψ_1}, t_2 in e_{ψ_2} return
      elem{T}{children(t_1), children(t_2)}

— If φ = ρ_{A_1/A_2}(ψ) then we define e_φ to be

    for t in e_ψ return
      elem{T}{
        for x in children(t) return
          if name(x) = A_1 then elem{A_2}{children(x)} else x
      }

— If φ = ψ_1 ∪ ψ_2 then we define e_φ as e_{ψ_1}, e_{ψ_2}.

— If φ = ψ_1 − ψ_2 then we define e_φ as follows.

    for t_1 in e_{ψ_1} return
      if f = () then t_1 else ()

Here, the subexpression f returns ∅ if t_1 ∉ e_{ψ_2}, and {t_1} otherwise. Let the
schema of ψ_1 and ψ_2 be {A_1, ..., A_n}, then f is constructed as follows:

    for t_2 in e_{ψ_2} return
      for x_1, ..., x_n in t_1 return
        for y_1, ..., y_n in t_2 return
          if name(x_1) = A_1 and ... and name(x_n) = A_n
             and name(y_1) = A_1 and ... and name(y_n) = A_n
             and children(x_1) = children(y_1)
             and ...
             and children(x_n) = children(y_n) then
            t_1
          else
            ()

Note that e_φ is well-defined under every input σ which is an encoding of some
D. Furthermore, on such σ, there is never need to evaluate concat or content
because:

— name is always evaluated on a singleton element node, and
— every if-test is between singleton data nodes (and hence, the application of
data on the values which are to be compared can be calculated without using
content).

Let Γ be the type assignment for which for every r ∈ FV(φ)

Γ(r) = coll(elem(coll(elem(single(data))))).

Note that σ is not necessarily an encoding of a database D when σ is
compatible with Γ. Hence, e_φ is not necessarily well-defined under Γ. However, we
can transform every σ(r) into an input which is an encoding of D(r) for some
database D as follows. Let the schema S(r) = {A_1, ..., A_n}.

    for t in r return
      for x_1, ..., x_n in t return
        if name(x_1) = A_1 and ... and name(x_n) = A_n then
          elem{T}{x_1, ..., x_n}
        else
          ()

Note that this expression does not change σ(r) if it is already a valid encoding.
Let e be the expression obtained by replacing every r in e_φ by the expression
above. It is easy to verify that now e is well-defined under Γ.

Note. We have assumed in the construction above that constant atoms (such
as A_1, ..., A_n) are expressions. This makes the construction easier, but is not
really necessary. We first note that we could have chosen any atom to represent
an attribute A_i, as long as it differs from the atoms representing other attributes.
Hence, we can create for every attribute name A_i thus used a variable x_{A_i} with
type single(atom). An environment σ is now an encoding of a database D if
(1) σ(r) is an encoding of D(r) (as before), and (2) σ(x_{A_i}) ≠ σ(x_{A_j}) when i ≠ j.
Let e' be the expression obtained from e by replacing A_i by x_{A_i}. Using if-tests
we can then evaluate e' if condition (2) holds, and evaluate () otherwise. □

B Well-definedness in PERX⁻

In this section we give the full proof that well-definedness for PERX⁻ is
undecidable.

Theorem 2. Well-definedness for PERX⁻ is undecidable.

Proof. In order to focus on the crux of the proof we will assume below that
atoms are expressions in PERX⁻[empty]. At the end of the proof we illustrate
how we can get rid of this assumption.

We give a reduction from the implication problem for functional and inclusion
dependencies, which is known to be undecidable [1,8]. We assume without loss
of generality that the database schema consists of a single relation symbol with
schema {A_1, ..., A_k}. A functional dependency φ is a rule X → Y where X and
Y are subsets of {A_1, ..., A_k}. We say that relation R satisfies φ, denoted by
R ⊨ φ, if for all tuples t_1, t_2 ∈ R, if π_X(t_1) = π_X(t_2) then also π_Y(t_1) = π_Y(t_2).
An inclusion dependency ψ is a rule of the form [B_1, ..., B_l] → [C_1, ..., C_l]
where {B_1, ..., B_l} and {C_1, ..., C_l} are subsets of {A_1, ..., A_k}. We say that
relation R satisfies ψ, denoted by R ⊨ ψ, if π_{B_1,...,B_l}(R) ⊆ π_{C_1,...,C_l}(R).

Let Σ be a set of functional and inclusion dependencies. Let φ_1, ..., φ_n be the
functional dependencies in Σ and let ψ_1, ..., ψ_m be the inclusion dependencies
in Σ. Let ρ be an additional target functional dependency. The implication
problem consists of checking for every relation R over {A_1, ..., A_k} that if R
satisfies every dependency in Σ, then R also satisfies ρ.
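
To make the two notions of satisfaction concrete, here is a small Python sketch that checks a functional or an inclusion dependency on one given finite relation. It only illustrates the definitions: the implication problem itself quantifies over all relations and is undecidable, so no such check decides it. The representation (a relation as a list of dictionaries) and the function names are assumptions of this sketch.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Row = Dict[str, Hashable]


def project(rel: List[Row], attrs: Sequence[str]) -> Set[Tuple]:
    """Projection of rel on the attribute list attrs, as a set of value tuples."""
    return {tuple(t[a] for a in attrs) for t in rel}


def satisfies_fd(rel: List[Row], X: Sequence[str], Y: Sequence[str]) -> bool:
    """R satisfies X -> Y: tuples that agree on X also agree on Y."""
    seen: Dict[Tuple, Tuple] = {}
    for t in rel:
        key = tuple(t[a] for a in X)
        val = tuple(t[a] for a in Y)
        # setdefault stores the first Y-value seen for this X-value.
        if seen.setdefault(key, val) != val:
            return False
    return True


def satisfies_ind(rel: List[Row], B: Sequence[str], C: Sequence[str]) -> bool:
    """R satisfies [B_1..B_l] -> [C_1..C_l]: the B-projection is contained in the C-projection."""
    return project(rel, B) <= project(rel, C)
```

For instance, a relation containing two tuples with equal A-values but different C-values violates A → C, while it may still satisfy the inclusion dependency [A] → [C].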

We assume without loss of generality that A_1, ..., A_k are atoms. We will
encode a k-tuple (A_1 : a_1, ..., A_k : a_k) by the element node

⟨A_1 : {⟨A_1 : {a_1}⟩, ..., ⟨A_k : {a_k}⟩}⟩.

A relation R is then encoded as the set of the encodings of tuples in R. Let r be
a variable, and let Γ be the type assignment for which

Γ(r) = coll(elem(coll(elem(single(data))))).

Note that σ(r) is not necessarily an encoding of some relation R when σ is
compatible with Γ. For example, suppose that k > 2, and consider

σ(r) = {⟨A_1 : {⟨A_1 : {⟨a⟩}⟩}⟩}.

Then σ ∈ Γ, but there is clearly no instance of R for which σ(r) is an encoding.
However, we can transform every input into an input which is an encoding
of some relation:⁵

    for t_1 in r return
      for x_1, ..., x_k in t_1 return
        if name(x_1) = A_1 and ... and name(x_k) = A_k then
          elem{A_1}{x_1, ..., x_k}
        else ()

The output of this expression is always a set of element nodes which are
encodings of k-ary tuples. In the remainder of this proof we will therefore assume
w.l.o.g. that r contains an actual encoding of some relation R: we can always
replace r by the expression above.

For every functional dependency φ = {B_1, ..., B_i} → {C_1, ..., C_j}, we create
the following expression e_φ:

    for t_1, t_2 in r return
      for x_1, ..., x_i, y_1, ..., y_j in t_1 return
        for u_1, ..., u_i, v_1, ..., v_j in t_2 return
          if name(x_1) = B_1 and ... and name(x_i) = B_i
             and name(y_1) = C_1 and ... and name(y_j) = C_j
             and name(u_1) = B_1 and ... and name(u_i) = B_i
             and name(v_1) = C_1 and ... and name(v_j) = C_j then
            if children(x_1) = children(u_1)
               and ...
               and children(x_i) = children(u_i)
               and (children(y_1) ≠ children(v_1)
                    or ...
                    or children(y_j) ≠ children(v_j)
               ) then elem{A_1}{()} else ()
          else ()

5 Here, we allow binding multiple variables in one for loop, and also allow boolean
combinations of conditions in an if test. Both features can clearly be simulated in
PERX⁻.

Note that on input σ for which σ(r) is an encoding of relation R this expression
returns ∅ if R ⊨ φ, and {⟨A_1 : ∅⟩} otherwise.

For every inclusion dependency ψ = R : [B_1, ..., B_l] ⊆ R : [C_1, ..., C_l] we
create the following expression e_ψ:

    for t_1 in r return elem{A_1}{
      for t_2 in r return
        for x_1, ..., x_l in t_1 return
          for y_1, ..., y_l in t_2 return
            if name(x_1) = B_1 and ... and name(x_l) = B_l
               and name(y_1) = C_1 and ... and name(y_l) = C_l
               and children(x_1) = children(y_1)
               and ...
               and children(x_l) = children(y_l)
            then elem{A_1}{()} else ()
    }, elem{A_1}{elem{A_1}{()}}

Note that on input σ for which σ(r) is an encoding of relation R, this expression
returns {⟨A_1 : {⟨A_1 : ∅⟩}⟩} if R ⊨ ψ, and {⟨A_1 : {⟨A_1 : ∅⟩}⟩, ⟨A_1 : ∅⟩} otherwise.
Let D_0, ..., D_n, E_1, ..., E_m be different elements of 𝒜. We then create the
expression e_1 as follows:

    elem{A_1}{
      elem{D_0}{e_ρ}, elem{D_1}{e_{φ_1}}, ..., elem{D_n}{e_{φ_n}},
      elem{E_1}{e_{ψ_1}}, ..., elem{E_m}{e_{ψ_m}}
    }

Let f_1, ..., f_n, h be expressions of the form () or elem{A_1}{()}, and let
g_1, ..., g_m be expressions of the form (elem{A_1}{()}, elem{A_1}{elem{A_1}{()}}),
or elem{A_1}{elem{A_1}{()}}. We call an expression

    elem{A_1}{
      elem{D_0}{h}, elem{D_1}{f_1}, ..., elem{D_n}{f_n},
      elem{E_1}{g_1}, ..., elem{E_m}{g_m}
    }

admissible when, if f_1 = ... = f_n = () and g_1 = ... = g_m = elem{A_1}{elem{A_1}{()}},
then h = (). Clearly the number of admissible expressions is finite. Let {i_1, ..., i_k}
be the set of all admissible expressions. Then we define e_2 as the expression

    i_1, ..., i_k.

Note that e_1 and e_2 are well-defined under Γ. Furthermore, the following
node content type is an output type for e_1 and e_2 under Γ:

γ := coll(elem(coll(elem(coll(elem(coll(elem(coll(data))))))))).

Also, Σ ⊨ ρ if, and only if, e_1(σ) ⊆ e_2(σ) for every σ ∈ Γ. Hence, Σ ⊨ ρ if, and
only if, (e_1, e_2)(σ) = e_2(σ) for every σ ∈ Γ. Consequently,

e := name(elem{A_1}{e_1 ∪ e_2}, elem{A_1}{e_2})

is defined if and only if Σ ⊨ ρ. Furthermore, (elem{A_1}{e_1 ∪ e_2}, elem{A_1}{e_2})
has output type single(elem(γ)) if, and only if, Σ ⊨ ρ.

Note. We have assumed in the construction above that constant atoms (such
as A_1, ..., A_n) are expressions. This makes the construction easier, but is not
really necessary. We first note that we could have chosen any atom to represent
an attribute A_i, as long as it differs from the atoms representing other attributes.
Hence, we can create for every attribute name A_i thus used a variable x_{A_i} with
type single(atom). An environment σ is now an encoding of a relation R if (1)
σ(r) is an encoding of R (as before), and (2) σ(x_A) ≠ σ(x_B) when A ≠ B. Let
e' be the expression obtained from e by replacing every A by x_A. Using if-tests
we can then evaluate e' if condition (2) holds, and evaluate () otherwise. □

C Simulating pure PERX in PENRC[kind]

In this section we give the full proof of the simulation of PERX in PENRC[kind].

Lemma 4. The pure PERX can effectively be simulated in PENRC[kind].

Proof.

enc(a) = a                        enc(⟨a⟩) = ((a, a), ∅)

enc(⟨a : N⟩) = (a, enc(N))        enc(v) = {enc(i) | i ∈ v}
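
The four equations for enc can be read as a recursive function. The following Python sketch is one way to do so, under assumptions that are not part of the paper: atoms are strings, a data node ⟨a⟩ is the tagged tuple ("data", a), an element node ⟨a : N⟩ is ("elem", a, N), and collections are frozensets.

```python
def enc(v):
    """Encode a pure RX value as a nested-relational value (pairs and sets)."""
    if isinstance(v, frozenset):
        # enc(v) = {enc(i) | i in v}
        return frozenset(enc(i) for i in v)
    if isinstance(v, tuple) and v[0] == "data":
        # enc(<a>) = ((a, a), {})
        return ((v[1], v[1]), frozenset())
    if isinstance(v, tuple) and v[0] == "elem":
        # enc(<a : N>) = (a, enc(N))
        return (v[1], enc(v[2]))
    # enc(a) = a for atoms
    return v
```

The three shapes of the result (atom, pair whose first component is a pair, pair whose first component is an atom) are what the kind tests in the translation below distinguish.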

We will now verify that enc has the required properties. Let τ be a pure
RX-type. We define the pure NRC-type τ' by induction on τ as follows.

atom'                    = atom
data'                    = (atom × atom) × coll(∅)
elem(ν_1 ∪ ... ∪ ν_k)'   = atom × coll(ν_1' ∪ ... ∪ ν_k')
coll(τ)'                 = coll(τ')
(τ_1 ∪ τ_2)'             = τ_1' ∪ τ_2'

Obviously, v ∈ τ if, and only if, enc(v) ∈ τ'. If κ is a pure PERX-kind, then we
define the PENRC-kind κ' by induction on κ as follows:

atom'         = atom
data'         = (atom × atom) × coll
elem'         = atom × coll
(κ_1 ∪ κ_2)'  = κ_1' ∪ κ_2'

Obviously, v ∈ κ if, and only if, enc(v) ∈ κ'.

Let e be a pure PERX-expression. We will define the pure PENRC-expression
e' by induction on e as follows. We first introduce some syntactic sugar: we write
e_1 ∈ κ → e_2 for e_1 ∈ κ ? e_2 : π_1(∅). Intuitively, this expression will be used
to verify that the input to e' is an encoding of a legal input to e. Otherwise, we
become undefined.



x 



x 



⟨e⟩' := e' ∈ atom → ((e', e'), ∅)
⟨e_1 : e_2⟩' := e_1' ∈ atom →
                (e_1', {x ∈ atom ? ((x, x), ∅) : x | x ∈ e_2'})
data(e)' := ⋃{x ∈ (atom × atom) × coll ? {π_1(π_1(x))}
              : x ∈ atom ? {x} : ∅ | x ∈ e'}
name(e)' := π_1(e') ∈ atom → π_1(e')
children(e)' := ⋃{π_2(x) | x ∈ e'}
()' := ∅
(e)' := e' ∈ atom ∪ ((atom × atom) × coll) ∪ (atom × coll) → {e'}
(e_1, e_2)' := e_1' ∪ e_2'
(for x : κ in e_1 return e_2)' := ⋃{x ∈ κ' ? e_2' : ∅ | x ∈ e_1'}
(if e_1 eq e_2 then e_3 else e_4)' := e_1' = e_2' ? e_3' : e_4'

A straightforward induction on e now shows that

1. e(σ) is defined if, and only if, e'(enc(σ)) is defined, and

2. if e(σ) is defined then enc(e(σ)) = e'(enc(σ)). □

D Well-definedness for PENRC[kind]

In this section we give the proof of the small model properties mentioned in 
Section 6.3. Whenever we write "expression" we mean PENRC[kind]-expression, 
and whenever we write type we mean PENRC-type. 
Let k be a natural number. We write V_k for the set of all values v for which
the cardinality of a set occurring in v is at most k:

V_k = 𝒜 ∪ (V_k × V_k) ∪ {v ⊆ V_k | |v| ≤ k}.

We write ℰ_k for the set of all environments σ for which the cardinality of a set
occurring in σ is at most k:

ℰ_k = {σ | σ environment on some X and σ(x) ∈ V_k for all x ∈ X}.

The sub-value relation ⊑ on values is inductively defined by the following
inference rules:

— a ⊑ a for every atom a;
— (v, w) ⊑ (v', w') whenever v ⊑ v' and w ⊑ w';
— {v_1, ..., v_n} ⊑ {w_1, ..., w_m} whenever for all v_i there exists w_j such that v_i ⊑ w_j.

This relation can be extended component-wise to environments: let σ and σ' be
two environments on the same set of variables X, then σ ⊑ σ' if σ(x) ⊑ σ'(x) for
all x ∈ X.

If u, v, w are values, and u ⊑ w and v ⊑ w, then we define u ⊔ v as follows:

a ⊔ a = a     (u_1, u_2) ⊔ (v_1, v_2) = (u_1 ⊔ v_1, u_2 ⊔ v_2)     u ⊔ v = u ∪ v (for sets u and v)

Lemma 5. If u ⊑ w and v ⊑ w, then u ⊑ u ⊔ v, v ⊑ u ⊔ v, and u ⊔ v ⊑ w.
Moreover, if u ∈ V_k and v ∈ V_l, then u ⊔ v ∈ V_{k+l}.

Proof. By a straightforward induction on w. 

We first give some characteristics of ⊑ with regard to kinds and types.

Lemma 6. If v ⊑ w then v ∈ κ if and only if w ∈ κ.

Proof. By a trivial induction on κ.

Lemma 7. Let τ be a type. If v ⊑ w and w ∈ τ, then v ∈ τ.

Proof. By a trivial induction on τ.

The PENRC has the following monotonicity properties. The proof is by a
straightforward induction.

Lemma 8 (Monotonicity). Let e be an expression, and let σ and σ' be
environments on FV(e) such that σ ⊑ σ'. If e(σ) and e(σ') are defined, then e(σ) ⊑ e(σ').
If e(σ) is undefined, then so is e(σ').

In what follows we will frequently take the minimum min(v) of a value v,
which is obtained by replacing every set occurring in v by the empty set. For
example, min(({a, b}, (a, {{c, d}}))) = (∅, (a, ∅)). Obviously, min(v) ∈ V_0 and
min(v) ⊑ v.
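
The sub-value relation and the minimum of a value are easy to experiment with. The sketch below is only illustrative and rests on assumptions of its own: atoms are strings, pairs are 2-tuples, sets are frozensets, and the names `subvalue`, `minimum`, and `max_card` are hypothetical. `max_card(v)` returns the least k such that v belongs to V_k.

```python
def is_set(v):
    return isinstance(v, frozenset)


def is_pair(v):
    return isinstance(v, tuple) and len(v) == 2


def subvalue(v, w):
    """Sub-value test: atoms must be equal, pairs are compared
    component-wise, and every member of a set v needs a larger member in w."""
    if is_pair(v) and is_pair(w):
        return subvalue(v[0], w[0]) and subvalue(v[1], w[1])
    if is_set(v) and is_set(w):
        return all(any(subvalue(x, y) for y in w) for x in v)
    if is_pair(v) or is_set(v) or is_pair(w) or is_set(w):
        return False  # values of different shapes are incomparable
    return v == w


def minimum(v):
    """Replace every set occurring in v by the empty set."""
    if is_set(v):
        return frozenset()
    if is_pair(v):
        return (minimum(v[0]), minimum(v[1]))
    return v


def max_card(v):
    """Largest cardinality of a set occurring in v (0 if there is none)."""
    if is_set(v):
        return max([len(v)] + [max_card(x) for x in v])
    if is_pair(v):
        return max(max_card(v[0]), max_card(v[1]))
    return 0
```

Applied to the example from the text, `minimum` yields a value with `max_card` equal to 0 that is a sub-value of the original.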
Before we are ready to state our small model properties we need one final
definition. Let e be an expression, and let k be a natural number. The k-complexity
c(e, k) of e is inductively defined as follows:

c(x, k) = k

c((e_1, e_2), k) = c(e_1 ∪ e_2, k) = c(e_1, k) + c(e_2, k)

c(π_1(e'), k) = c(π_2(e'), k) = c(⋃ e', k) = c(e', k)

c({e'}, k) = k × c(e', k)

c(∅, k) = 0

c({e_2 | x ∈ e_1}, k) = c(e_1, max(k, c(e_2, k))) + k × c(e_2, k)

c(e_1 = e_2 ? e_3 : e_4, k) = max(c(e_3, k), c(e_4, k))

c(e_1 ∈ κ ? e_2 : e_3, k) = max(c(e_2, k), c(e_3, k))
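
The definition of the k-complexity is a plain structural recursion, which the following Python sketch spells out. The abstract syntax used here (tagged tuples such as ("var", x) or ("compr", body, x, source)) is a hypothetical encoding chosen for this sketch, not notation from the paper.

```python
def complexity(e, k):
    """k-complexity c(e, k) of an expression given as a tagged tuple.

    Assumed (hypothetical) constructors:
      ("var", x), ("empty",), ("pair", e1, e2), ("union", e1, e2),
      ("proj1", e), ("proj2", e), ("flatten", e), ("single", e),
      ("compr", e2, x, e1)          for {e2 | x in e1},
      ("eq", e1, e2, e3, e4)        for e1 = e2 ? e3 : e4,
      ("kind", e1, kappa, e2, e3)   for e1 in kappa ? e2 : e3.
    """
    tag = e[0]
    if tag == "var":
        return k
    if tag == "empty":
        return 0
    if tag in ("pair", "union"):
        return complexity(e[1], k) + complexity(e[2], k)
    if tag in ("proj1", "proj2", "flatten"):
        return complexity(e[1], k)
    if tag == "single":
        return k * complexity(e[1], k)
    if tag == "compr":
        _, body, _var, source = e
        c_body = complexity(body, k)
        return complexity(source, max(k, c_body)) + k * c_body
    if tag in ("eq", "kind"):
        # Only the two branches contribute; they sit at positions 3 and 4.
        return max(complexity(e[3], k), complexity(e[4], k))
    raise ValueError(f"unknown expression tag: {tag!r}")
```

For example, for {(x, y) | x ∈ r} and k = 2 the body has complexity 4, so the source r is measured at max(2, 4) = 4 and the total is 4 + 2 × 4 = 12.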

Lemma 9 (Small model). Let k be a natural number, let e be an expression,
and let σ be an environment on FV(e) such that e(σ) is defined. Let u ∈ V_k.
If u ⊑ e(σ), then there exists an environment σ' ∈ ℰ_{c(e,k)} with σ' ⊑ σ such that
u ⊑ e(σ').

Proof. Note that since e(σ) is defined, e(δ) is also defined for every δ ⊑ σ by
monotonicity. Moreover, if σ_1 ⊑ σ and σ_2 ⊑ σ, then σ_1 ⊑ σ_1 ⊔ σ_2, σ_2 ⊑ σ_1 ⊔ σ_2,
and σ_1 ⊔ σ_2 ⊑ σ by Lemma 5. We will use these facts silently throughout this
proof.

In order to write the proof in a succinct manner, let us define the set
P(u, e, σ, k) by

P(u, e, σ, k) = {σ' | σ' ∈ ℰ_{c(e,k)}, σ' ⊑ σ, and u ⊑ e(σ')}.

We will prove by induction on e that P(u, e, σ, k) is non-empty.

— If e = x, then we define σ' by

    σ'(y) = u              if y = x
            min(σ(y))      otherwise

— If e = ∅, then we take σ' = min(σ).

— If e = (e_1, e_2), then e(σ) is a pair. Hence, u = (u_1, u_2) for some u_1, u_2 ∈
V_k. By the induction hypothesis there exist σ_1 ∈ P(u_1, e_1, σ, k) and σ_2 ∈
P(u_2, e_2, σ, k). Then σ_1 ⊔ σ_2 ∈ ℰ_{c(e_1,k)+c(e_2,k)} = ℰ_{c(e,k)}. Then, by
monotonicity:

(u_1, u_2) ⊑ (e_1(σ_1), e_2(σ_2)) ⊑ (e_1(σ_1 ⊔ σ_2), e_2(σ_1 ⊔ σ_2)) = e(σ_1 ⊔ σ_2)

Hence, σ_1 ⊔ σ_2 ∈ P(u, e, σ, k).
— If e = e_1 ∪ e_2, then e(σ) is a set. Since u ⊑ e(σ) there exists w_v ∈ e(σ) with
v ⊑ w_v for every v ∈ u. Define

u_1 = {v ∈ u | w_v ∈ e_1(σ)}
u_2 = {v ∈ u | w_v ∈ e_2(σ)}

Then u = u_1 ∪ u_2, u_1 ⊑ e_1(σ), and u_2 ⊑ e_2(σ). Moreover, u_1, u_2 ∈ V_k. The
result then follows from the induction hypothesis by a reasoning similar to
the previous case.

— If e = π_1(e'), then e'(σ) is a pair (v, w). Let u' = (u, min(w)). Then u' ⊑ (v, w)
since u ⊑ v and min(w) ⊑ w. Moreover, u' ∈ V_k since min(w) ∈ V_0. Hence
there exists σ' ∈ P(u', e', σ, k) by the induction hypothesis. Hence, u =
π_1(u') ⊑ π_1(e'(σ')) = e(σ'). Since ℰ_{c(e',k)} = ℰ_{c(e,k)}, σ' ∈ P(u, e, σ, k). The
case where e = π_2(e') is similar.

— If e = {e'} then we discern two cases. If u = ∅, then u ⊑ e(σ') for any
σ' ⊑ σ by monotonicity. Hence, it suffices to take σ' = min(σ), which is in
ℰ_0. Otherwise, u contains at least one and at most k elements. Let v ∈ u.
Then v ∈ V_k, and v ⊑ e'(σ) since u ⊑ e(σ). By the induction hypothesis there
exists σ_v ∈ P(v, e', σ, k). Let σ' = ⊔_{v ∈ u} σ_v. Then σ_v ⊑ σ' for every v ∈ u,
and σ' ⊑ σ. By monotonicity we then have

v ⊑ e'(σ_v) ⊑ e'(σ').

And hence u ⊑ e(σ'). Moreover, σ' ∈ ℰ_{k × c(e',k)} = ℰ_{c(e,k)} by Lemma 5. Hence,
σ' ∈ P(u, e, σ, k).

— If e = ⋃ e' then e'(σ) is a set of sets. For every v ∈ u there exists w_v ∈ e(σ)
such that v ⊑ w_v since u ⊑ e(σ).

Let e'(σ) = {z_1, ..., z_n}. Define

u_i = {v ∈ u | w_v ∈ z_i \ ⋃_{j<i} z_j}.

Note that the cardinality of each of the u_i's is at most k, and that at most
k of the u_i's are non-empty. Furthermore, u_i ⊑ z_i. Let u' be the set of all
non-empty u_i's. Then u' ⊑ e'(σ) and u' ∈ V_k. The result then follows from
the induction hypothesis.

— If e = e_1 = e_2 ? e_3 : e_4, then e_1(σ), e_2(σ) ∈ 𝒜. Suppose e_1(σ) = e_2(σ), then
u ⊑ e_3(σ). By the induction hypothesis there exists σ' ∈ P(u, e_3, σ, k). Then
e_1(σ') = e_1(σ) = e_2(σ) = e_2(σ') by monotonicity and hence e(σ') = e_3(σ').
We then have by the induction hypothesis that u ⊑ e_3(σ') = e(σ'). Since
σ' ∈ ℰ_{c(e_3,k)} ⊆ ℰ_{c(e,k)}, σ' ∈ P(u, e, σ, k). The case where e_1(σ) ≠ e_2(σ) is
similar.

— If e = e_1 ∈ κ ? e_2 : e_3, we discern two cases. If e_1(σ) ∈ κ then u ⊑ e_2(σ).
By the induction hypothesis there exists σ' ∈ P(u, e_2, σ, k). By monotonicity,
e_1(σ') ⊑ e_1(σ). Hence, e_1(σ') ∈ κ by Lemma 6. Then e(σ') = e_2(σ'), and
hence u ⊑ e_2(σ') = e(σ'). Then σ' ∈ P(u, e, σ, k) since σ' ∈ ℰ_{c(e_2,k)} ⊆ ℰ_{c(e,k)}.
The case where e_1(σ) ∉ κ is similar.
— If e = {e_2 | x ∈ e_1}, then e(σ) is a set. Let v ∈ u. Since u ⊑ e(σ) there exists
w_v ∈ e(σ) such that v ⊑ w_v. Since e(σ) is obtained by a comprehension over
e_1(σ), there also must exist some z_v ∈ e_1(σ) such that v ⊑ w_v = e_2(x : z_v, σ).
Hence, there exists (x : z'_v, σ'_v) ∈ P(v, e_2, (x : z_v, σ), k) by the induction
hypothesis. Let u' = {z'_v | v ∈ u}. Then u' contains at most k elements of
V_{c(e_2,k)}. Hence, u' ∈ V_{max(k, c(e_2,k))}. Moreover, (x : z'_v, σ'_v) ⊑ (x : z_v, σ) by the
induction hypothesis, so z'_v ⊑ z_v, and hence u' ⊑ e_1(σ).

By applying the induction hypothesis again, this time with bound max(k, c(e_2, k)),
there exists σ_1 ∈ P(u', e_1, σ, max(k, c(e_2, k))).
Let σ' = σ_1 ⊔ ⊔_{v ∈ u} σ'_v. Note that σ_1 ⊑ σ', and σ'_v ⊑ σ' for every v ∈ u.
Furthermore, σ' ⊑ σ and the maximum cardinality of a set in σ' is bounded
by (Lemma 5):

c(e_1, max(k, c(e_2, k))) + k × c(e_2, k) = c(e, k)

Now u' ⊑ e_1(σ_1) ⊑ e_1(σ') by monotonicity. Hence, for every z'_v there exists
some z''_v ∈ e_1(σ') with z'_v ⊑ z''_v. Then (x : z'_v, σ'_v) ⊑ (x : z''_v, σ'), and hence
v ⊑ e_2(x : z'_v, σ'_v) ⊑ e_2(x : z''_v, σ') by monotonicity. Since this holds for
every v ∈ u, we have u ⊑ e(σ'). □

Lemma 10. Let e be an expression and σ an environment on FV(e). If e(σ) is
undefined, then there exists σ' ∈ ℰ_{c(e,1)} with σ' ⊑ σ such that e(σ') is undefined.

Proof. The proof is by induction on e. We note again that if e'(σ) is defined,
then e'(δ) is also defined for every δ ⊑ σ by monotonicity.

— If e = x, or e = ∅, then there is nothing to prove, since e(σ) is always defined.

— If e = (e_1, e_2), then either e_1(σ) or e_2(σ) is undefined. The result then follows
by the induction hypothesis. The case where e = {e'} is similar.

— If e = π_1(e'), then either e'(σ) is undefined, in which case the result follows
from the induction hypothesis, or e'(σ) is not a pair. Let σ' = min(σ). By
monotonicity e'(σ') cannot be a pair, and hence e(σ') is also undefined.
Moreover, σ' ∈ ℰ_0 ⊆ ℰ_{c(e,1)}.

— If e = e_1 ∪ e_2, then either e_1(σ) is undefined, e_2(σ) is undefined, e_1(σ) is not
a set, or e_2(σ) is not a set. In the first two cases the result follows from the
induction hypothesis. In the third case, let σ' = min(σ). By monotonicity
e_1(σ') cannot be a set, and hence e(σ') is undefined. Moreover, σ' ∈ ℰ_0 ⊆
ℰ_{c(e,1)}. The last case is similar.

— If e = ⋃ e', then either e'(σ) is undefined, or e'(σ) is not a set of sets.
In the first case the result follows from the induction hypothesis. In the
latter case we have two possibilities. If e'(σ) is not a set, then let σ' =
min(σ). By monotonicity, e'(σ') cannot be a set, and hence e(σ') is undefined.
Moreover, σ' ∈ ℰ_0 ⊆ ℰ_{c(e,1)}. If e'(σ) is a set, but not a set of sets, then
there exists some u ∈ e'(σ) that is not a set. Then {min(u)} ∈ V_1, and
{min(u)} ⊑ e'(σ). By Lemma 9 there exists σ'' ∈ ℰ_{c(e',1)} = ℰ_{c(e,1)} with σ'' ⊑ σ
such that {min(u)} ⊑ e'(σ''). Hence, e'(σ'') is not a set of sets, and e(σ'') is
also undefined.




— If e = e_1 = e_2 ? e_3 : e_4, then we have three possibilities. If e_1(σ) or e_2(σ)
is undefined, then the result follows from the induction hypothesis. If e_1(σ)
and e_2(σ) are defined and e_1(σ) = e_2(σ), then e_3(σ) must be undefined. By
the induction hypothesis, there exists σ' ∈ ℰ_{c(e_3,1)} ⊆ ℰ_{c(e,1)} with σ' ⊑ σ such
that e_3(σ') is also undefined. By monotonicity e_1(σ') = e_2(σ'), and hence
e(σ') is undefined. If e_1(σ) ≠ e_2(σ) the reasoning is similar.

— If e = e_1 ∈ κ ? e_2 : e_3, then we have three possibilities. If e_1(σ) is undefined,
then the result follows from the induction hypothesis. If e_1(σ) is defined and
e_1(σ) ∈ κ, then e_2(σ) must be undefined. By the induction hypothesis we
have σ' ∈ ℰ_{c(e_2,1)} ⊆ ℰ_{c(e,1)} with σ' ⊑ σ such that e_2(σ') is still undefined.
By monotonicity, e_1(σ') ⊑ e_1(σ), and hence e_1(σ') ∈ κ by Lemma 6. Hence,
e(σ') = e_2(σ') which is undefined. If e_1(σ) is defined, but e_1(σ) ∉ κ the
reasoning is similar.

— If e = {e_2 | x ∈ e_1}, then we have three possibilities.

1. If e_1(σ) is undefined, then the result follows from the induction
hypothesis.

2. If e_1(σ) is defined, but is not a set, then let σ' = min(σ). By monotonicity,
e_1(σ') cannot be a set, and hence e(σ') is undefined. Moreover, σ' ∈
ℰ_{c(e_1,1)} ⊆ ℰ_{c(e,1)}.

3. Otherwise, e_1(σ) is defined and a set, but there is some v ∈ e_1(σ)
such that e_2(x : v, σ) is undefined. By the induction hypothesis, there
exists (x : u, σ_2) ∈ ℰ_{c(e_2,1)} with (x : u, σ_2) ⊑ (x : v, σ) such that e_2(x :
u, σ_2) is undefined. Then {u} ∈ V_{max(1, c(e_2,1))}, and {u} ⊑ e_1(σ). By
Lemma 9 there exists σ_1 ∈ ℰ_{c(e_1, max(1, c(e_2,1)))} with σ_1 ⊑ σ such that
{u} ⊑ e_1(σ_1). Let σ' = σ_1 ⊔ σ_2. Note that σ_1 ⊑ σ' and σ_2 ⊑ σ'. By
monotonicity {u} ⊑ e_1(σ'). Hence, there exists some u' ∈ e_1(σ') such that
u ⊑ u'. Then (x : u, σ_2) ⊑ (x : u', σ'), and e_2(x : u', σ') is undefined by
monotonicity of undefinedness. Hence, e(σ') is undefined. Moreover, σ' ∈
ℰ_{c(e_1, max(1, c(e_2,1))) + c(e_2,1)} = ℰ_{c(e,1)}. □

Corollary 7 (Small model for undefinedness). Let e be an expression, let
Γ be a type assignment on e, and let σ be an environment compatible with Γ such
that e(σ) is undefined. There exists a natural number l which depends only on e,
and an environment σ' ∈ ℰ_l compatible with Γ such that e(σ') is also undefined.

Proof. By Lemma 10, there exists σ' ∈ ℰ_{c(e,1)} such that σ' ⊑ σ and e(σ') is
undefined. Moreover, σ' ∈ Γ since σ ∈ Γ by Lemma 7. □

Functions mapping atoms to atoms can naturally be lifted to functions
defined on values by extending them element- or component-wise. Indeed, let
f be a function mapping atoms to atoms, then f can be lifted to values by
taking f((v, w)) = (f(v), f(w)), and f({v, ..., w}) = {f(v), ..., f(w)}. Also,
functions defined on values can be lifted to functions on environments by taking
f(σ)(x) = f(σ(x)).
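
The lifting described above can be sketched in a few lines of Python. The representation is an assumption of this sketch (atoms as strings, pairs as 2-tuples, sets as frozensets, environments as dictionaries), and `lift` and `lift_env` are hypothetical names.

```python
def lift(f):
    """Lift a function on atoms to a function on values, component-wise."""
    def lifted(v):
        if isinstance(v, tuple):
            return (lifted(v[0]), lifted(v[1]))
        if isinstance(v, frozenset):
            return frozenset(lifted(x) for x in v)
        return f(v)  # v is an atom
    return lifted


def lift_env(f):
    """Lift a function on atoms to environments: f(env)(x) = f(env(x))."""
    lifted = lift(f)
    return lambda env: {x: lifted(v) for x, v in env.items()}


# Example: the permutation of atoms that swaps "a" and "b".
swap = {"a": "b", "b": "a"}
rho = lambda atom: swap.get(atom, atom)
```

With `rho` as above, `lift(rho)` renames atoms inside a value without changing its shape, which is exactly the kind of renaming that genericity is about.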

It is easy to show by induction that the PENRC is generic.

Lemma 11 (Genericity). Let e be an expression, let σ be an environment on
FV(e), and let ρ be a permutation of 𝒜. If e(σ) is defined, then e(ρ(σ)) =
ρ(e(σ)). If e(σ) is undefined, then so is e(ρ(σ)).
Theorem 3. Let e be an expression and let Γ be a type assignment on FV(e).
It is decidable to check whether e is well-defined under Γ.

Proof. Suppose that e is not well-defined under Γ. Then there exists some σ
compatible with Γ such that e(σ) is undefined. By Lemma 10 there exists some
σ' ∈ ℰ_{c(e,1)} compatible with Γ with σ' ⊑ σ such that e(σ') is undefined.

Let k be a natural number, and let τ be a type. Let us denote the maximum
number of atoms a value in τ ∩ V_k can mention by rank(τ ∩ V_k). Then

rank(𝒜 ∩ V_k) = 1
rank((τ_1 × τ_2) ∩ V_k) = rank(τ_1 ∩ V_k) + rank(τ_2 ∩ V_k)
rank((τ_1 ∪ τ_2) ∩ V_k) = max(rank(τ_1 ∩ V_k), rank(τ_2 ∩ V_k))
rank(coll(τ') ∩ V_k) = k × rank(τ' ∩ V_k)

Consequently, the maximum number of atoms an environment in ℰ_{c(e,1)}
compatible with Γ can mention is bounded by

l = Σ_{x ∈ FV(e)} rank(Γ(x) ∩ V_{c(e,1)}).

Note that l depends only on e and Γ. Let A = {a_1, ..., a_l} ⊆ 𝒜. Let us denote
the atoms occurring in an environment δ by A(δ). Since the number of different
atoms occurring in σ' is at most l, there exists a permutation ρ of 𝒜 such that
A(ρ(σ')) = ρ(A(σ')) ⊆ A. By genericity, e(ρ(σ')) is also undefined.

Hence, in order to check if e is well-defined under Γ it suffices to enumerate
all environments γ compatible with Γ that mention only atoms in A, and check
whether e(γ) is defined. There are only a finite number of such γ, from which
the result follows. □
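
The rank recurrences and the resulting bound on the number of atoms used in the proof above can be computed mechanically. The Python sketch below assumes a hypothetical encoding of types as tagged tuples and takes the complexity bound (the value c(e, 1) in the proof) as a plain integer argument; the names `rank` and `atom_bound` are not from the paper.

```python
def rank(t, k):
    """Maximum number of atoms mentioned by a value of type t in V_k.

    Assumed (hypothetical) type constructors:
      ("atom",), ("prod", t1, t2), ("union", t1, t2), ("coll", t1).
    """
    tag = t[0]
    if tag == "atom":
        return 1
    if tag == "prod":
        return rank(t[1], k) + rank(t[2], k)
    if tag == "union":
        return max(rank(t[1], k), rank(t[2], k))
    if tag == "coll":
        return k * rank(t[1], k)
    raise ValueError(f"unknown type tag: {tag!r}")


def atom_bound(gamma, c):
    """Sum of rank(gamma(x), c) over all variables x of the type assignment gamma."""
    return sum(rank(t, c) for t in gamma.values())


ATOM = ("atom",)
```

For instance, with the bound 3, a variable of type coll(atom × atom) contributes 3 × 2 = 6 atoms and a variable of type atom contributes 1, so at most 7 distinct atoms need to be considered.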

Lemma 12. If v is a value and v ∉ τ, then there exists a natural number k and
u ∈ V_k such that u ⊑ v and u ∉ τ.

Proof. Let us define the complexity c(τ') of a type τ' as follows.

c(𝒜) = 0
c(τ_1 × τ_2) = max(c(τ_1), c(τ_2))
c(τ_1 ∪ τ_2) = c(τ_1) + c(τ_2)
c(coll(τ')) = max(1, c(τ'))

Let v be a value with v ∉ τ. We show that there exists a value u ∈ V_{c(τ)} with
u ⊑ v such that u ∉ τ by induction on τ.

— If τ = 𝒜, then take u = v.

— If τ = τ_1 × τ_2, then v = (v_1, v_2) and either v_1 ∉ τ_1, or v_2 ∉ τ_2. We can apply
the induction hypothesis in both cases.

— If τ = τ_1 ∪ τ_2, then v ∉ τ_1 and v ∉ τ_2. By the induction hypothesis there
exist u_1 ∈ V_{c(τ_1)} and u_2 ∈ V_{c(τ_2)} with u_1 ⊑ v and u_2 ⊑ v such that u_1 ∉ τ_1 and
u_2 ∉ τ_2. Take u = u_1 ⊔ u_2, and suppose u ∈ τ. Then either u ∈ τ_1, or u ∈ τ_2.
If u ∈ τ_1, then also u_1 ⊑ u would have to be in τ_1 by Lemma 7, which is a
contradiction. If u ∈ τ_2, then also u_2 ⊑ u would have to be in τ_2, which is
also a contradiction. Hence, u ∉ τ. Moreover, u ∈ V_{c(τ_1)+c(τ_2)} = V_{c(τ)}.

— Finally, if τ = coll(τ'), then there exists some v' ∈ v such that v' ∉ τ'. By the
induction hypothesis there exists u' ∈ V_{c(τ')} such that u' ⊑ v' and u' ∉ τ'.
Then {u'} ⊑ v and {u'} ∉ τ. □

Corollary 8 (Small model for semantic type-checking). Let e be an
expression, let Γ be a type assignment on e such that e is well-defined under Γ,
and let τ be a type. Let σ be an environment on e such that e(σ) ∉ τ. There
exists a natural number l which depends only on e and τ, and an environment
σ' ∈ ℰ_l compatible with Γ such that also e(σ') ∉ τ.

Proof. Since e(σ) ∉ τ, there exists a natural number k and a value u ∈ V_k with
u ⊑ e(σ) such that u ∉ τ by Lemma 12. By Lemma 9, there exists σ' ∈ ℰ_{c(e,k)}
such that σ' ⊑ σ and u ⊑ e(σ'). Since u ∉ τ, e(σ') is also not in τ by Lemma 7.
Since σ ∈ Γ, also σ' ∈ Γ by Lemma 7. □



